Scalars and Vectors

StartR Workshop

Maik Bieleke, PhD

University of Konstanz

November 23, 2024

Scalars

What are scalars?

Scalars are the simplest objects in R. They are single values that can be assigned to a variable.

The assignment operator <- is used for creating objects like scalars.

scalar <- ...

The most common types of scalars are numerical and character.

  • numerical scalars

    x <- 10
    x
    [1] 10
    y <- 3 / 100
    y
    [1] 0.03
    z <- (x + y) / y
    z
    [1] 334.3333
  • character scalars

    a <- "Hello"
    a
    [1] "Hello"
    b <- "12345"
    b
    [1] "12345"
    c <- "Hello World!"
    c
    [1] "Hello World!"

Types of Scalars

R treats different types of scalars differently. For example, you can add two numerical values, but you cannot add two character values.

# Adding two numerical values: works fine
a <- 1
b <- 2
a + b
[1] 3
# Adding two character values: gives an error
a <- "1"
b <- "2"
a + b
Error in a + b: non-numeric argument to binary operator
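
If character values such as "1" actually represent numbers, they can be converted before doing arithmetic. The following is a minimal sketch using the base R functions class() and as.numeric(); the variable name total is only illustrative.

```r
a <- "1"
b <- "2"

# class() reports how R stores a value
class(a)  # "character"

# Convert the character values to numbers, then add them
total <- as.numeric(a) + as.numeric(b)
total     # 3
```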

Mathematical functions

R has many built-in mathematical functions that can be applied to scalars.

Function                  Description                                 Example
abs(x)                    absolute value of x                         abs(-4) = 4
sqrt(x)                   square root of x                            sqrt(25) = 5
ceiling(x)                smallest integer not less than x            ceiling(3.475) = 4
floor(x)                  largest integer not greater than x          floor(3.475) = 3
trunc(x)                  integer formed by truncating x toward 0     trunc(5.99) = 5
round(x, n)               round x to n decimal places                 round(3.475, 2) = 3.48
signif(x, n)              round x to n significant digits             signif(3.475, 2) = 3.5
cos(x), sin(x), tan(x)    trigonometric functions (x in radians)      cos(2) = -0.416
log(x, n)                 logarithm of x to the base n                log(5, 2) = 2.32
log(x), log10(x)          natural and common logarithm                log(10) = 2.3026
exp(x)                    exponential function                        exp(2.3026) = 10
x %% y                    x modulo y (remainder of x divided by y)    7 %% 3 = 1

Note that these functions can also be applied to vectors, in which case they will be applied elementwise.
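
As a small illustration of elementwise application, the sketch below applies two of the functions above to a vector. It uses c(), which is introduced in the next section; the names v, roots, and remainders are only illustrative.

```r
# c() combines several values into a vector
v <- c(1, 4, 9, 16)

# Each function is applied to every element in turn
roots <- sqrt(v)       # 1 2 3 4
remainders <- v %% 3   # 1 1 0 1
```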

Vectors

Basics

What are vectors?

Vectors can be thought of as a collection of scalars. They are the most common data type in R.

The c() function is used for creating vectors.

vector <- c(...)

  • Vectors from scalars

    # Combining scalars
    a <- c(1, 3, 5)
    a
    [1] 1 3 5
  • Vectors from vectors

    # Combining vectors
    a <- c("a", "b", "c")
    b <- c("d", "e")
    c <- c(a, b)
    c
    [1] "a" "b" "c" "d" "e"

Types of vectors

Analogous to scalars, vectors can only contain values of the same type. Different types will be coerced into the same type.

For numerical and character values, coercion will always result in character values.

a <- c(1, 2, 3, "a", "b", "c")
a
[1] "1" "2" "3" "a" "b" "c"
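
One way to check which type a vector ended up with is the base R function class(). The sketch below also shows that character values consisting only of digits can be converted back with as.numeric(); the vector names are only illustrative.

```r
nums  <- c(1, 2, 3)
mixed <- c(1, 2, "a")

class(nums)    # "numeric"
class(mixed)   # "character"

# Character values that look like numbers can be converted back
digits <- c("1", "2", "3")
converted <- as.numeric(digits)
converted      # 1 2 3
```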

Length of vectors

The length of a vector is the number of elements it contains.

The length() function returns the number of elements in a vector.

  • Numerical scalar

    a <- 1
    length(a)
    [1] 1
  • Character scalar

    a <- "This is a sentence."
    length(a)
    [1] 1
  • Numerical vector

    a <- c(11, 2, 333, 4, 5555)
    length(a)
    [1] 5
  • Character vector

    a <- c("abc", "def", "geh")
    length(a)
    [1] 3

Regular vectors

Colon operator (:)

R has built-in operators and functions for creating regular sequences as vectors.

The colon operator (a:b) creates a numeric vector from a to b in steps of 1.

a:b

  • Counting up

    1:5
    [1] 1 2 3 4 5
  • Counting down

    5:1
    [1] 5 4 3 2 1
  • Decimal values

    5.5:10.5
    [1]  5.5  6.5  7.5  8.5  9.5 10.5
  • Excluding boundaries

    5.5:10
    [1] 5.5 6.5 7.5 8.5 9.5
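
A common pitfall with the colon operator is operator precedence: the colon is evaluated before arithmetic operators such as + and -. The sketch below shows the difference that parentheses make; n, shifted, and shorter are illustrative names.

```r
n <- 5

# ":" is evaluated before "-", so this is (1:n) - 1
shifted <- 1:n - 1
shifted     # 0 1 2 3 4

# Parentheses give the sequence from 1 to n - 1
shorter <- 1:(n - 1)
shorter     # 1 2 3 4
```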

Sequence function: seq()

Sometimes more flexibility is needed when creating a sequence of numbers than the colon operator can provide.

The seq() function creates a numeric vector from a to b with a desired step size or a desired length.

seq(from = a, to = b, by = step, length.out = length)

  • Steps

    seq(from = 1, to = 3, 
        by = 0.5)
    [1] 1.0 1.5 2.0 2.5 3.0
  • Length

    seq(from = 1, to = 10, 
        length.out = 4)
    [1]  1  4  7 10
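
Two further cases that often come up are counting down and splitting an interval into evenly spaced values. The following is a minimal sketch; the names down and even are only illustrative.

```r
# Counting down requires a negative step
down <- seq(from = 10, to = 1, by = -3)
down   # 10 7 4 1

# Five evenly spaced values between 0 and 1
even <- seq(from = 0, to = 1, length.out = 5)
even   # 0.00 0.25 0.50 0.75 1.00
```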

Repetition function: rep()

We can also create vectors in which certain values are repeated.

The rep() function creates a numeric or character vector in which a scalar or vector is repeated a desired number of times or to a desired length.

rep(x, times = n, length.out = n, each = n)

  • Repeat x

    rep(x = 3, 
        times = 5)
    [1] 3 3 3 3 3
  • Repeat values of x

    rep(x = c(1, 2), 
        each = 2)
    [1] 1 1 2 2
  • Desired length

    rep(x = c("a", "b"), 
        length.out = 5)
    [1] "a" "b" "a" "b" "a"
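
The arguments of rep() can also be combined, and times accepts a vector. The sketch below illustrates both; block and uneven are illustrative names.

```r
# each is applied first, then the whole result is repeated
block <- rep(x = c(1, 2), each = 2, times = 2)
block    # 1 1 2 2 1 1 2 2

# A vector for times repeats each element a different number of times
uneven <- rep(x = c("a", "b"), times = c(3, 1))
uneven   # "a" "a" "a" "b"
```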

Exercise ✏️

  1. Create the scalar object x with the value 10.

    Solution
    x <- 10
  2. What is the length of vector y defined as y <- 1:x?

    Solution
    y <- 1:x
    length(y)
  3. Create a vector z from 11 to 100 in steps of 2.

    Solution
    z <- seq(11, 100, by = 2)
  4. Create a vector yz with all elements of y and z combined.

    Solution
    yz <- c(y, z)

Indexing Vectors

Numerical indexing

Indexing with positive integers

Specify a scalar or a vector of positive integers corresponding to the values you want to extract.

The [] operator is used for indexing vectors.

  • Extract a single value

    # Extract the 3rd value
    x <- c("a", "b", "c", "d", "e")
    x[3]
    [1] "c"
  • Extract several values

    # Extract the first three values
    x <- c("a", "b", "c", "d", "e")
    x[1:3]
    [1] "a" "b" "c"

Indexing with negative integers

Specify a scalar or a vector of negative integers corresponding to the values you want to exclude.

  • Exclude a single value

    # Exclude the 5th value
    x <- c("a", "b", "c", "d", "e")
    x[-5]
    [1] "a" "b" "c" "d"
  • Exclude several values

    # Exclude every second value (positions 1, 3, and 5)
    x <- c("a", "b", "c", "d", "e")
    x[-seq(from = 1, to = 5, by = 2)]
    [1] "b" "d"
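
A few more properties of numerical indexing, shown as a small sketch: positions may appear in any order and may repeat, length() can be used to reach the last value, and a position beyond the end returns NA rather than an error.

```r
x <- c("a", "b", "c", "d", "e")

# Positions can be given in any order and may repeat
picked <- x[c(5, 1, 1)]
picked         # "e" "a" "a"

# length() gives the position of the last value
last <- x[length(x)]
last           # "e"

# A position beyond the end returns NA instead of an error
beyond <- x[7]
beyond         # NA
```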

Logical indexing

Logical vectors

We already know numerical and character vectors. Logical vectors are a third type of vector in R. They can only take the values TRUE and FALSE. The abbreviations T and F also work, but they are ordinary variables that can be overwritten, so the full names are safer.

# Create a logical vector manually
x <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
x
[1]  TRUE FALSE  TRUE FALSE  TRUE

Logical vectors are commonly created by applying logical operators to numerical or character vectors. One example is the equality operator == which returns TRUE if two values are equal and FALSE otherwise.

x <- 5 # set the value of x to 5
x == 5 # check whether x has the value 5
[1] TRUE

Logical operators

An overview of the most common logical operators:

a == b   equal       a > b   greater than   a >= b   greater than or equal
a != b   not equal   a < b   less than      a <= b   smaller than or equal
a | b    or          !a      not            any(a)   at least one
a & b    and         %in%    in set         all(a)   everything

Examples

3 < 5 # is 3 smaller than 5?
[1] TRUE
5 <= 2 # is 5 smaller than or equal to 2?
[1] FALSE
!(3 == 5) # is 3 not equal to 5?
[1] TRUE
c(4, 5) %in% c(1, 2, 3, 4) # are 4 and 5 in the vector?
[1]  TRUE FALSE
c("a", "b", "c", "d") == "c" # are the values of the vector equal to "c"?
[1] FALSE FALSE  TRUE FALSE
any(c("a", "b", "c", "d") == "c") # is at least one value equal to "c"?
[1] TRUE
(3 > 7) | (5 < 10) # is 3 greater than 7 OR 5 smaller than 10?
[1] TRUE
(3 > 7) & (5 < 10) # is 3 greater than 7 AND 5 smaller than 10?
[1] FALSE

Indexing with logical vectors

Examples

  • equality operator

    x <- c("a", "b", "c", "d", "e")
    x[x == "c"]
    [1] "c"
  • inequality operator

    a <- 1:5
    a[a < 3]
    [1] 1 2
  • negation operator

    a <- 1:5
    b <- 3
    a[!a == b]
    [1] 1 2 4 5
  • and-operator

    a <- 1:10
    a[a < 4 & a > 6]
    integer(0)
  • or-operator

    a <- 1:10
    a[a < 3 | a > 7]
    [1]  1  2  8  9 10
  • set-operator

    x <- c("a", "b", "c", "d", "e")
    y <- c("b", "g")
    y[y %in% x]
    [1] "b"
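
In each of the examples above, the expression inside the brackets is first evaluated to a logical vector, and only the values at the TRUE positions are kept. The sketch below makes that intermediate step visible; keep is an illustrative name, and which() and sum() are base R functions.

```r
a <- c(4, 8, 15, 16, 23, 42)

# Step 1: the comparison yields one TRUE or FALSE per element
keep <- a > 10
keep          # FALSE FALSE TRUE TRUE TRUE TRUE

# Step 2: indexing keeps the values at the TRUE positions
a[keep]       # 15 16 23 42

# which() returns the positions of the TRUE values
which(keep)   # 3 4 5 6

# sum() counts them, because TRUE counts as 1 and FALSE as 0
sum(keep)     # 4
```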

Value assignment

Changing vector values

Combine indexing and assignment to change values.

  • Change a single value

    # Define a new vector
    x <- c(1, 2, 3, 4, 5)
    
    # Change the 3rd value to 8
    x[3] <- 8
    x
    [1] 1 2 8 4 5
  • Change several values

    # Define a new vector
    x <- c(1, 2, 3, 4, 5)
    
    # Change the last two values to 7 and 9
    x[c(4, 5)] <- c(7, 9)
    x
    [1] 1 2 3 7 9

As always, we can also use logical indexing to change values.

x <- c("a", "b", "b", "c", "d", "b", "e", "f", "b", "b")

# Change all instances of letter "b" to letter "z"
x[x == "b"] <- "z"
x
 [1] "a" "z" "z" "c" "d" "z" "e" "f" "z" "z"
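
The same pattern works with numerical conditions. As a hedged illustration with made-up values, the sketch below caps every value above 100 at 100; scores is a hypothetical vector.

```r
# Hypothetical scores, two of which exceed the maximum of 100
scores <- c(12, 105, 87, 250, 99)

# Replace every value greater than 100 with 100
scores[scores > 100] <- 100
scores   # 12 100 87 100 99
```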

Operations on vectors

Elementwise operations

Operations on vectors are performed elementwise.

  • Adding a single value

    a <- c(1, 2, 3, 4, 5)
    a + 1
    [1] 2 3 4 5 6
  • Adding two vectors

    a <- c(1, 2, 3, 4, 5)
    b <- c(10, 20, 30, 40, 50)
    a + b
    [1] 11 22 33 44 55
  • Computing the square root

    b <- c(4, 9, 16)
    sqrt(b)
    [1] 2 3 4
  • Dividing by a single value

    d <- c(10, 100, 1000)
    d / 10
    [1]   1  10 100
  • Product of two vectors

    a <- c(10, 100, 1000)
    b <- c(2, 3, 4)
    a * b
    [1]   20  300 4000
  • Exponentiation of vectors

    a <- c(2, 3, 4)
    b <- c(2, 3, 4)
    a^b
    [1]   4  27 256

Recycling principle

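An operation on two vectors of different lengths will recycle the shorter vector to match the length of the longer vector. R warns only if the longer length is not a multiple of the shorter one (as in the following example, where 5 is not a multiple of 2); otherwise recycling happens silently.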

a <- c(1, 2, 3, 4, 5)
b <- c(0.5, 1.0)
a + b
[1] 1.5 3.0 3.5 5.0 5.5

This is also why operations with scalars work on vectors. They are recycled to the length of the vector.

a <- c(2, 3, 4)
a^2
[1]  4  9 16
b <- c(2, 2, 2)
a^b
[1]  4  9 16
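
Whether recycling produces a warning depends on the two lengths. The following sketch contrasts both cases; tryCatch() is used only to capture the warning text, and the names silent and msg are illustrative.

```r
# Length 6 is a multiple of length 2: b is recycled three times, silently
a <- 1:6
b <- c(10, 20)
silent <- a + b
silent   # 11 22 13 24 15 26

# Length 5 is not a multiple of length 2: R still recycles, but warns
msg <- tryCatch(
  1:5 + c(0.5, 1.0),
  warning = function(w) conditionMessage(w)
)
msg      # "longer object length is not a multiple of shorter object length"
```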

Statistical functions

R has many built-in statistical functions that can be applied to vectors.

Function     Description           Example for x <- c(1, 2, 2, 5)
mean(x)      mean                  mean(x) = 2.5
sum(x)       sum                   sum(x) = 10
median(x)    median                median(x) = 2
sd(x)        standard deviation    sd(x) = 1.732051
var(x)       variance              var(x) = 3
range(x)     range                 range(x) = 1 5
min(x)       minimum               min(x) = 1
max(x)       maximum               max(x) = 5
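
To see what var() and sd() compute, the values can be reproduced by hand with elementwise operations. R uses the sample variance, which divides by n - 1 rather than n. The sketch below is only a check of the numbers in the table; manual_var and manual_sd are illustrative names.

```r
x <- c(1, 2, 2, 5)
n <- length(x)

# Sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_var   # 3

# The standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)
manual_sd    # 1.732051

# Both agree with the built-in functions
all.equal(manual_var, var(x))   # TRUE
all.equal(manual_sd, sd(x))     # TRUE
```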

Other functions

Some other, non-statistical functions for vectors are:

Function     Description                   Example for x <- c(3, 8, 8, 5)
sort(x)      sorts the elements            sort(x) = 3 5 8 8
rev(x)       reversed order of elements    rev(x) = 5 8 8 3
length(x)    number of elements            length(x) = 4
unique(x)    unique elements               unique(x) = 3 8 5
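
These functions can be combined or adjusted with arguments. As a brief sketch, sort() accepts decreasing = TRUE, and nesting unique() inside length() counts the distinct values; the names descending and n_distinct are illustrative.

```r
x <- c(3, 8, 8, 5)

# Sort from largest to smallest
descending <- sort(x, decreasing = TRUE)
descending   # 8 8 5 3

# Count the number of distinct values
n_distinct <- length(unique(x))
n_distinct   # 3
```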

Exercise ✏️

  1. letters is a predefined vector of the English alphabet in R. Use it to extract the 15th letter of the alphabet.

    Solution
    letters[15]
  2. Create a vector x with every second letter of the alphabet.

    Solution
    x <- letters[seq(1, 26, by = 2)]
  3. Use the ! (not) and the %in% (set) operators to remove the vowels from x and assign the resulting vector to y.

    Solution
    y <- x[!x %in% c("a", "e", "i", "o", "u")]
  4. Replace the lower-case letter “m” in y with the upper-case letter “M”.

    Solution
    y[y == "m"] <- "M"

Missings

What are missings?

Missings are values that are not available for some reason.

Missing values are represented by NA (= Not Available).

Missings can be assigned and replaced like regular values.

  • Assign missings to vector

    a <- c(1, 2, NA, 4, 5)
    a
    [1]  1  2 NA  4  5
  • Replace missings manually

    a <- c(1, 2, NA, 4, 5)
    a[3] <- 9
    a
    [1] 1 2 9 4 5

Logical indexing can be used to identify and replace missings.

  • Find missings with is.na()

    a <- c(1, 2, NA, 4, 5)
    is.na(a)
    [1] FALSE FALSE  TRUE FALSE FALSE
  • Replace missings logically

    a <- c(1, 2, NA, 4, 5)
    a[is.na(a)] <- 9
    a
    [1] 1 2 9 4 5
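
A related pitfall is worth a short sketch: missings cannot be found with the equality operator, because any comparison with NA is itself NA. This is why is.na() is used instead. The names compared and found are illustrative.

```r
a <- c(1, 2, NA, 4, 5)

# Comparing with NA yields NA for every element, not TRUE or FALSE
compared <- a == NA
compared   # NA NA NA NA NA

# is.na() is the reliable test
found <- is.na(a)
found      # FALSE FALSE TRUE FALSE FALSE
```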

Missings in functions

Many descriptive statistics functions return NA if the vector contains missings.

# Define a vector with missings
a <- c(5, 2, NA, 9, 2, NA, NA, 3)

mean(a)
[1] NA
sum(a)
[1] NA
range(a)
[1] NA NA

To avoid this, you can often set the argument na.rm to TRUE to remove missings before computing the statistic.

mean(a, na.rm = TRUE)
[1] 4.2
sum(a, na.rm = TRUE)
[1] 21
range(a, na.rm = TRUE)
[1] 2 9
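
Before removing missings, it is often useful to know how many there are. The sketch below counts them and extracts the non-missing values with logical indexing, which gives the same mean as na.rm = TRUE; n_missing and complete are illustrative names.

```r
a <- c(5, 2, NA, 9, 2, NA, NA, 3)

# Count the missings: TRUE counts as 1 when summed
n_missing <- sum(is.na(a))
n_missing        # 3

# Keep only the values that are not missing
complete <- a[!is.na(a)]
complete         # 5 2 9 2 3

# Same result as mean(a, na.rm = TRUE)
mean(complete)   # 4.2
```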